13 research outputs found

    Locating Cache Performance Bottlenecks Using Data Profiling

    Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers attribute costs to specific code locations. The costs due to frequent cache misses on a given piece of data, however, may be spread over instructions throughout the application. The resulting individually small costs at a large number of instructions can easily appear insignificant in a code profiler's output. DProf helps programmers understand cache miss costs by attributing misses to data types instead of code. Associating cache misses with data helps programmers locate data structures that experience misses in many places in the application's code. DProf introduces a number of new views of cache miss data, including a data profile, which reports the data types with the most cache misses, and a data flow graph, which summarizes how objects of a given type are accessed throughout their lifetime, and which accesses incur expensive cross-CPU cache loads. We present two case studies of using DProf to find and fix cache performance bottlenecks in Linux. The improvements provide a 16-57% throughput improvement on a range of memcached and Apache workloads.
    Funding: MathWorks, Inc. Fellowship; National Science Foundation (U.S.) (Grant number CNS-0834415).
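
    To make the data-profile idea above concrete, the minimal C sketch below aggregates sampled cache-miss addresses by the data type that owns each address and prints a per-type miss count. The type table, address ranges, and sample values are invented for illustration; DProf itself relies on hardware sampling and kernel type information, which this sketch does not attempt to reproduce.

        /* Illustrative sketch only: attributing sampled cache-miss addresses
         * to data types, in the spirit of a "data profile".  All names and
         * numbers here are hypothetical, not DProf's code. */
        #include <stdio.h>
        #include <stdint.h>

        struct type_range {            /* one object region and its type */
            const char *type_name;
            uintptr_t   start, end;    /* [start, end) address range */
            uint64_t    misses;        /* miss samples attributed here */
        };

        static struct type_range types[] = {
            { "tcp_sock", 0x1000, 0x2000, 0 },   /* made-up example ranges */
            { "sk_buff",  0x2000, 0x3000, 0 },
        };

        /* Attribute one sampled miss address to the type that owns it. */
        static void record_miss(uintptr_t addr)
        {
            for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++)
                if (addr >= types[i].start && addr < types[i].end) {
                    types[i].misses++;
                    return;
                }
        }

        int main(void)
        {
            uintptr_t samples[] = { 0x1040, 0x1048, 0x2100, 0x1040 }; /* fake samples */
            for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
                record_miss(samples[i]);
            for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++)
                printf("%-10s %llu misses\n", types[i].type_name,
                       (unsigned long long)types[i].misses);
            return 0;
        }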

    Cache craftiness for fast multicore key-value storage

    We present Masstree, a fast key-value database designed for SMP machines. Masstree keeps all data in memory. Its main data structure is a trie-like concatenation of B+-trees, each of which handles a fixed-length slice of a variable-length key. This structure effectively handles arbitrary-length, possibly binary keys, including keys with long shared prefixes. B+-tree fanout was chosen to minimize total DRAM delay when descending the tree and prefetching each tree node. Lookups use optimistic concurrency control, a read-copy-update-like technique, and do not write shared data structures; updates lock only affected nodes. Logging and checkpointing provide consistency and durability. Though some of these ideas appear elsewhere, Masstree is the first to combine them. We discuss design variants and their consequences. On a 16-core machine, with logging enabled and queries arriving over a network, Masstree executes more than six million simple queries per second. This performance is comparable to that of memcached, a non-persistent hash table server, and higher (often much higher) than that of VoltDB, MongoDB, and Redis.
    Funding: National Science Foundation (U.S.) (Award 0834415); National Science Foundation (U.S.) (Award 0915164); Quanta Computer (Firm).
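
    As a rough illustration of the key-slicing structure described above, the sketch below splits a variable-length key into fixed-length 8-byte slices, one per trie layer, encoded so that integer comparison preserves lexicographic order. The slice width and encoding details are assumptions made for illustration and are not taken from Masstree's source.

        /* Sketch of the key-slicing idea behind a trie of B+-trees: each
         * layer indexes one fixed-length slice of the variable-length key. */
        #include <stdint.h>
        #include <string.h>
        #include <stdio.h>

        /* Return the n-th 8-byte slice of a key as a big-endian integer so
         * that integer comparison matches lexicographic order; short keys
         * are zero-padded. */
        static uint64_t key_slice(const char *key, size_t len, size_t n)
        {
            unsigned char buf[8] = { 0 };
            size_t off = n * 8;
            if (off < len)
                memcpy(buf, key + off, len - off < 8 ? len - off : 8);
            uint64_t v = 0;
            for (int i = 0; i < 8; i++)
                v = (v << 8) | buf[i];
            return v;
        }

        int main(void)
        {
            const char *k = "user:000123:profile";
            size_t len = strlen(k);
            /* A lookup would descend one B+-tree per slice until the key ends. */
            for (size_t n = 0; n * 8 < len; n++)
                printf("layer %zu slice: 0x%016llx\n", n,
                       (unsigned long long)key_slice(k, len, n));
            return 0;
        }

    Keys that share a long prefix share the same slices in the upper layers, so they descend the same upper B+-trees and diverge only where their contents differ.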

    UFlood: High-throughput flooding over wireless mesh networks

    This paper proposes UFlood, a flooding protocol for wireless mesh networks. UFlood targets situations such as software updates where all nodes need to receive the same large file of data, and where limited radio range requires forwarding. UFlood's goals are high throughput and low airtime, defined respectively as the rate of completion of a flood to the slowest receiving node and the total time spent transmitting. The key to achieving these goals is good choice of sender for each transmission opportunity. The best choice evolves as a flood proceeds in ways that are difficult to predict. UFlood's core new idea is a distributed heuristic to dynamically choose the senders likely to lead to all nodes receiving the flooded data in the least time. The mechanism takes into account which data nearby receivers already have as well as inter-node channel quality. The mechanism includes a novel bit-rate selection algorithm that trades off the speed of high bit-rates against the larger number of nodes likely to receive low bit-rates. Unusually, UFlood uses both random network coding to increase the usefulness of each transmission and detailed feedback about what data each receiver already has; the feedback is critical in deciding which node's coded transmission will have the most benefit to receivers. The required feedback is potentially voluminous, but UFlood includes novel techniques to reduce its cost. The paper presents an evaluation on a 25-node 802.11 test-bed. UFlood achieves 150% higher throughput than MORE, a high-throughput flooding protocol, using 65% less airtime. UFlood uses 54% less airtime than MNP, an existing efficient protocol, and achieves 300% higher throughput.
    Funding: National Science Foundation (U.S.) (Grant CNS-0721702); Foxconn (Sponsorship).
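
    Because sender selection is the core of the design, a toy version of its flavor is sketched below: each candidate sender is scored by how much still-missing data its neighbors would likely receive, weighted by an estimated link delivery probability. The scoring formula, topology, and numbers are invented for illustration; UFlood's real heuristic also folds in bit-rate choice and the usefulness of coded packets.

        /* Toy sender-selection sketch: pick the node whose transmission is
         * expected to deliver the most missing data to its neighbors.
         * All values here are made up for illustration. */
        #include <stdio.h>

        #define NODES 4

        /* p[s][r]: estimated delivery probability from sender s to receiver r. */
        static double p[NODES][NODES] = {
            { 0.0, 0.9, 0.4, 0.1 },
            { 0.9, 0.0, 0.8, 0.3 },
            { 0.4, 0.8, 0.0, 0.7 },
            { 0.1, 0.3, 0.7, 0.0 },
        };

        /* missing[r]: fraction of the flooded file receiver r still lacks. */
        static double missing[NODES] = { 0.0, 0.2, 0.6, 1.0 };

        int main(void)
        {
            int best = -1;
            double best_score = -1.0;
            for (int s = 0; s < NODES; s++) {
                double score = 0.0;
                for (int r = 0; r < NODES; r++)
                    if (r != s)
                        score += p[s][r] * missing[r];  /* expected useful delivery */
                printf("node %d score %.2f\n", s, score);
                if (score > best_score) { best_score = score; best = s; }
            }
            printf("next sender: node %d\n", best);
            return 0;
        }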

    Improving network connection locality on multicore systems

    Incoming and outgoing processing for a given TCP connection often execute on different cores: an incoming packet is typically processed on the core that receives the interrupt, while outgoing data processing occurs on the core running the relevant user code. As a result, accesses to read/write connection state (such as TCP control blocks) often involve cache invalidations and data movement between cores' caches. These can take hundreds of processor cycles, enough to significantly reduce performance. We present a new design, called Affinity-Accept, that causes all processing for a given TCP connection to occur on the same core. Affinity-Accept arranges for the network interface to determine the core on which application processing for each new connection occurs, in a lightweight way; it adjusts the card's choices only in response to imbalances in CPU scheduling. Measurements show that for the Apache web server serving static files on a 48-core AMD system, Affinity-Accept reduces time spent in the TCP stack by 30% and improves overall throughput by 24%.
    Funding: National Science Foundation (U.S.) (Grant number CNS-0834415); National Science Foundation (U.S.) (Grant number CNS-0915164); Quanta Computer (Firm).
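
    The sketch below illustrates the locality goal in user-space miniature: hash a connection's 4-tuple to a core so that packet processing and application processing can agree on where the connection lives. The hash and the mapping are illustrative assumptions; Affinity-Accept itself operates in the kernel's accept path and the NIC's flow-steering tables rather than in application code.

        /* Sketch of connection-to-core affinity: a deterministic hash of the
         * 4-tuple names the core that should handle the whole connection.
         * FNV-1a stands in for the Toeplitz hashing real NICs use. */
        #include <stdint.h>
        #include <stdio.h>

        struct four_tuple {
            uint32_t src_ip, dst_ip;
            uint16_t src_port, dst_port;
        };

        static uint32_t tuple_hash(const struct four_tuple *t)
        {
            const unsigned char *b = (const unsigned char *)t;
            uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
            for (size_t i = 0; i < sizeof(*t); i++)
                h = (h ^ b[i]) * 16777619u;           /* FNV-1a prime */
            return h;
        }

        int main(void)
        {
            int ncores = 48;
            struct four_tuple t = { 0x0a000001, 0x0a000002, 40000, 80 };
            /* If both the NIC (for incoming packets) and the accepting thread
             * consult the same mapping, all processing for this connection
             * stays on one core and its state stays in one cache. */
            printf("connection handled on core %u\n", tuple_hash(&t) % ncores);
            return 0;
        }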

    Reinventing Scheduling for Multicore Systems

    High performance on multicore processors requires that schedulers be reinvented. Traditional schedulers focus on keeping execution units busy by assigning each core a thread to run. Schedulers ought to focus, however, on high utilization of on-chip memory, rather than of execution cores, to reduce the impact of expensive DRAM and remote cache accesses. A challenge in achieving good use of on-chip memory is that the memory is split up among the cores in the form of many small caches. This paper argues for a form of scheduling that assigns each object and its operations to a specific core, moving a thread among the cores as it uses different objects.
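
    A minimal sketch of the object-to-core idea follows: each object records a home core, and operations on it are meant to run there so the object stays warm in that core's cache. The dispatch shown is a placeholder that only reports where the work would run; the paper's actual mechanism for migrating threads between cores is not reproduced here.

        /* Sketch of giving each object a home core and running its
         * operations there.  The dispatch is an assumption for illustration,
         * not the paper's design. */
        #include <stdio.h>

        #define NCORES 4

        struct object {
            int  home_core;    /* core whose cache should hold this object */
            long value;
        };

        /* In a real system this would enqueue work on the home core's run
         * queue (or move the calling thread there); here it only records
         * where the operation belongs. */
        static void run_on_home_core(struct object *o, long delta)
        {
            printf("core %d: apply delta %ld\n", o->home_core, delta);
            o->value += delta;
        }

        int main(void)
        {
            struct object counters[8];
            for (int i = 0; i < 8; i++) {
                counters[i].home_core = i % NCORES;  /* static assignment for the sketch */
                counters[i].value = 0;
            }
            run_on_home_core(&counters[5], 3);       /* thread "moves" to core 1 */
            return 0;
        }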

    An Analysis of Linux Scalability to Many Cores

    This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48-core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques (this paper introduces one new technique, sloppy counters), these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required 3002 lines of code changes in total. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.
    Funding: Quanta Computer (Firm); National Science Foundation (U.S.) (0834415); National Science Foundation (U.S.) (0915164); Microsoft Research (Fellowship); Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship.
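
    Sloppy counters are the one new technique the paper names, so a user-space sketch of the underlying per-core-counter idea follows: increments touch only a core-local, cache-line-padded slot, and the true total is computed only when a reader needs it. The kernel's sloppy counters additionally let each core hold spare references locally; that refinement, and the constants below, are assumptions for illustration rather than the paper's code.

        /* User-space sketch of the per-core counter idea: updates stay local
         * to one core's cache line, reads sum over all cores. */
        #include <stdio.h>

        #define NCORES     48
        #define CACHE_LINE 64

        /* Pad each per-core slot to its own cache line to avoid false sharing. */
        struct percore_counter {
            long count;
            char pad[CACHE_LINE - sizeof(long)];
        };

        static struct percore_counter counters[NCORES];

        static void counter_inc(int core)        /* fast path: local-only update */
        {
            counters[core].count++;
        }

        static long counter_read(void)           /* slow path: sum all cores */
        {
            long total = 0;
            for (int c = 0; c < NCORES; c++)
                total += counters[c].count;
            return total;
        }

        int main(void)
        {
            counter_inc(0);
            counter_inc(7);
            counter_inc(7);
            printf("total = %ld\n", counter_read());
            return 0;
        }

    The design trades a contention-free update path for a more expensive and possibly slightly stale read path, which suits counters that are updated constantly but read rarely.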

    Distributed Computer Systems

    Abstractions and implementation techniques for design of distributed systems; server design, network programming, naming, storage systems, security, and fault tolerance. Readings from current literature. 6 Engineering Design Points

    Flexible, Wide-Area Storage for Distributed Systems with WheelFS

    WheelFS is a wide-area distributed storage system intended to help multi-site applications share data and gain fault tolerance. WheelFS takes the form of a distributed file system with a familiar POSIX interface. Its design allows applications to adjust the tradeoff between prompt visibility of updates from other sites and the ability for sites to operate independently despite failures and long delays. WheelFS allows these adjustments via semantic cues, which provide application control over consistency, failure handling, and file and replica placement. WheelFS is implemented as a user-level file system and is deployed on PlanetLab and Emulab. Three applications (a distributed Web cache, an email service, and large file distribution) demonstrate that WheelFS's file system interface simplifies construction of distributed applications by allowing reuse of existing software. These applications would perform poorly with the strict semantics implied by a traditional file system interface, but by providing cues to WheelFS they are able to achieve good performance. Measurements show that applications built on WheelFS deliver comparable performance to services such as CoralCDN and BitTorrent that use specialized wide-area storage systems.
    Funding: National Science Foundation (U.S.) (Grant No. CNS-0720644); Microsoft Research Asia; Tsinghua University (Beijing, China).
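
    The semantic-cue interface can be pictured as ordinary POSIX calls whose behavior is adjusted by cue components embedded in the pathname, as in the sketch below. The mount point and the exact cue spellings (.EventualConsistency, .MaxTime=1000) are written from memory and should be treated as illustrative assumptions rather than a definitive WheelFS reference.

        /* Sketch of per-access semantic cues embedded in a pathname:
         * unmodified POSIX calls, with consistency and timeout behavior
         * chosen by cue path components.  Cue spellings are illustrative. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            /* Ask for relaxed consistency and a bounded wait for this one
             * access, so a distant or failed site cannot stall the read. */
            const char *path =
                "/wfs/cache/.EventualConsistency/.MaxTime=1000/page.html";

            int fd = open(path, O_RDONLY);
            if (fd < 0) {
                perror("open");            /* e.g. no WheelFS mount here */
                return 1;
            }
            char buf[4096];
            ssize_t n = read(fd, buf, sizeof(buf));
            printf("read %zd bytes under relaxed-consistency cues\n", n);
            close(fd);
            return 0;
        }

    Because the cues ride along in the path, existing programs can take advantage of them without being recompiled, which is what lets the applications in the abstract reuse unmodified software.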

    Modern Inorganic Aerogels

    Essentially, the term aerogel describes a special geometric structure of matter; it is limited neither to any particular material nor to any synthesis procedure. Hence, the possible variety of materials, and therefore the multitude of their applications, is almost unbounded. Here we present a comprehensive picture of the most promising developments in the field over the last few decades.